Variegated spatial–temporal landscape of COVID-19 infection in England: findings from spatially filtered multilevel models

Abstract
Background: Although empirical studies have examined COVID-19 infection from a spatial perspective, the majority have focused on the USA and China, and there has been a lacuna of systematic research to unpack the spatial landscape of infection in the UK and its related factors.
Methods: England's spatial–temporal patterns of COVID-19 infection levels in 2020 were examined via spatial clustering analysis. Spatially filtered multilevel models (SFMLM), capturing both hierarchical and horizontal spatial interactive effects, were applied to identify how different demographic, socio-economic, built environment and spatial contextual variables were associated with varied infection levels over the two waves in 2020.
Results: The fragmented spatial distribution of COVID-19 incidence in the first wave underwent a rural–urban shift and resulted in a clearer north–south divide in England throughout 2020. The SFMLM results not only identify the associations between variables at different spatial scales and the COVID-19 infection level but also highlight the increasing importance of the spatially dependent effect of the pandemic over time, and show that locational spatial contexts also help explain variations in infection rates.


Introduction
The headline COVID-19 statistics reported by the British government have been at the national or regional level. This lack of systematic spatial-temporal analysis of the footprint of COVID-19 has crippled local responses, leading to political mistrust of the government's local lockdown measures. 1 Many epidemiologic models have been built and applied under different contexts to predict COVID-19 incidence and diffusion, such as the susceptible, exposed, infected/infectious and recovered model 2 and the nonlinear smooth transition autoregressive model. 3 Social scientists have tried to establish links between COVID-19 incidence and different socio-economic and environmental factors. Most empirical studies have focused on the USA or China, and there has been a lacuna of systematic research to unpack the changing variegated spatial landscape of infections in the UK. The 'Build Back Fairer: COVID-19 Marmot Review' highlights that certain spatial settings and environmental conditions are likely to be disproportionately affected by the pandemic and thus may have a link to wider regional inequalities. 1,4 Having a rigorous understanding of the spatial landscape of COVID-19 is therefore seen as important to the development of holistic strategies and measures to tackle health and spatial inequalities. 5
This paper aims to bridge this critical knowledge gap by first exploring the geographic spread and temporal change of COVID-19 infection levels in England and then identifying their potential explanatory factors, through spatial clustering analysis and multilevel spatial modelling. Rather than focusing only on socio-economic and demographic factors, this study also examines the relationship between the often-overlooked built environment and locational variables and COVID-19 infection. The two key research tasks are: (i) mapping the spatial pattern of COVID-19 infection level and identifying spatial clusters and hotspots to track their changing pattern in 2020 and (ii) examining the changing relationship between different test variables and the COVID-19 infection level over time.

Units of analysis and time periods
The core spatial units of analysis are the 6791 Middle Super Output Areas (MSOAs), the level at which COVID-19 test data are officially published. (The MSOA is a population census geography, each consisting of 5000-15 000 people.) MSOAs were also aggregated to 326 local authority districts (LADs) for spatial modelling. The analytical period is the 9-month period between 20 March 2020 and 1 January 2021. The changing patterns are examined for two time periods: 'Time Period 1 (TP1)' refers to the week ending on 20 March to the week ending on 3 July 2020; and 'Time Period 2 (TP2)' is the week ending on 10 July 2020 to the week ending on 1 January 2021. (Although the first confirmed case in England was found in late January 2020, local infection data have only been available for public download from March 2020.) The rationale for splitting the analysis into these two time periods is to reflect the decline of COVID-19 cases in summer 2020, which separates the two waves of infection, but the precise cut-off is based on the pragmatic need to capture consistent official data for spatial comparison. Before 2 July, only the National Health Service swab test results for those with a medical need and critical key workers (Pillar one data: https://coronavirus.data.gov.uk/details/about-data) were released by Public Health England. Since then, the swab tests for the wider population at drive-through centres and from home testing kits (Pillar two data: https://coronavirus.data.gov.uk/about) have also been included in the daily publication. We focus only on infection levels in 2020 to control for the impact and the interactive effect of the vaccination programme since 2021.
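The period definitions above can be sketched as a small helper that assigns each weekly data release to TP1 or TP2. This is only an illustrative sketch: the function names are hypothetical, and it assumes that the weekly releases are identified by their week-ending date and fall at 7-day intervals from 20 March 2020.

```python
from datetime import date, timedelta

# Period boundaries as stated in the text (week-ending dates).
TP1_START, TP1_END = date(2020, 3, 20), date(2020, 7, 3)
TP2_START, TP2_END = date(2020, 7, 10), date(2021, 1, 1)


def assign_period(week_ending):
    """Return 'TP1', 'TP2', or None for a given week-ending date."""
    if TP1_START <= week_ending <= TP1_END:
        return "TP1"
    if TP2_START <= week_ending <= TP2_END:
        return "TP2"
    return None  # outside the study window


def week_endings(start, end):
    """List week-ending dates from start to end (inclusive) in 7-day steps."""
    n_steps = (end - start).days // 7
    return [start + timedelta(days=7 * k) for k in range(n_steps + 1)]


tp1_weeks = week_endings(TP1_START, TP1_END)
tp2_weeks = week_endings(TP2_START, TP2_END)
print(len(tp1_weeks), len(tp2_weeks))  # number of weekly releases per period
```

Under these assumptions TP1 covers 16 weekly releases and TP2 covers 26, which is one reason why counts from the two periods are not compared directly.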

Variables
The key dependent variable of this study is the COVID-19 infection level, measured by the official data of 'people tested positive per 100 000 population' for spatial modelling. The lack of Pillar two data in TP1 means that our understanding of the real gravity of COVID-19 circulation in the first wave is constrained. Because of this inconsistency in data compilation, the location quotient (LQ) of COVID-19 positive cases was used to map the spatial clusters/outliers by calculating Local Moran's I values. The use of LQ allows us to examine the spatial concentration of COVID-19 levels by benchmarking against the proportion of cases in England as a whole. The formula of LQ is

$$LQ_i = \frac{X_i / \sum_i X_i}{P_i / \sum_i P_i} \times 100 \qquad (1)$$

in which $X_i$ is the number of infections in MSOA $i$ in a particular period (TP1/TP2), $\sum_i X_i$ is the sum of infection cases of all MSOAs, $P_i$ denotes the population in MSOA $i$ and $\sum_i P_i$ refers to the total population of all MSOAs. A value of 100 signifies the same infection level as England as a whole, a value under 100 a lower infection level than England and a value over 100 a higher one.
A range of test variables, informed by the literature and official reports and covering demographic, socio-economic, built environment and locational dimensions, were included at the MSOA and/or LAD level (see the supplementary document).
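As a minimal sketch of the LQ calculation in equation (1), the following computes the quotient for a list of areas. The function name and the three-area toy figures are hypothetical and are not taken from the study data.

```python
def location_quotients(cases, populations):
    """Location quotient per area: share of cases over share of population, times 100."""
    if len(cases) != len(populations):
        raise ValueError("cases and populations must have the same length")
    total_cases = sum(cases)
    total_pop = sum(populations)
    if total_cases == 0 or total_pop == 0:
        raise ValueError("totals must be non-zero")
    return [
        (x / total_cases) / (p / total_pop) * 100
        for x, p in zip(cases, populations)
    ]


# Toy example (made-up numbers): three areas.
toy_cases = [10, 30, 60]
toy_pops = [1000, 1000, 2000]
for area, lq in enumerate(location_quotients(toy_cases, toy_pops), start=1):
    print(f"area {area}: LQ = {lq:.1f}")
```

In the toy data the first area holds 10% of cases but 25% of the population, so its LQ falls below 100, while the other two areas sit above 100. Because the LQ is a ratio of shares, it is comparable between periods even when the testing regime, and therefore the absolute case count, differs.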

Spatial clustering analysis
Local patterns of spatial association were identified by mapping Local Moran's I values 6 of the LQ of COVID-19 infection rate to detect local clusters and outliers. The Local Moran's I statistic is calculated as

$$I_i = \frac{x_i - \bar{x}}{S_i^2} \sum_{j \neq i} W_{i,j}\,(x_j - \bar{x})$$

where $I_i$ represents the Local Moran's I statistic for MSOA $i$; $x_i$ is the LQ value for MSOA $i$; $x_j$ is the LQ for MSOA $i$'s neighbouring MSOA $j$ ($i \neq j$); $\bar{x}$ denotes the mean LQ value of all observations; $W_{i,j}$ is the spatial weight between MSOA $i$ and neighbouring area $j$; and $S_i$ represents the deviation value for $i$. The method has limitations regarding how to decide the spatial contiguity matrix objectively and in not being linked to scaling laws. 7 Local Moran's I values in this study were calculated via the Mapping Clusters toolset in ArcGIS Pro.
Based on a distance decay function, the spatial weight between $i$ and $j$ was determined by the inverse distance setting. The Local Moran's I index, together with its computed z-score and P-value, was used to derive four statistically significant spatial groups: (i) high-value cluster (HH); (ii) low-value cluster (LL); (iii) high-to-low value outlier (HL); and (iv) low-to-high value outlier (LH).
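The clustering step can be illustrated with a simplified sketch that computes a Local Moran's I value with inverse-distance weights and labels each area by quadrant. It is not the ArcGIS Pro implementation: it assumes point coordinates (for example, area centroids) with no two areas at the same location, takes $S_i^2$ as the variance of the other observations around the global mean, applies no distance threshold or row standardisation, and omits the z-score/P-value screening, so every area receives a label regardless of significance. All names and data are hypothetical.

```python
import math


def local_morans_i(coords, values):
    """Return a list of (I_i, label) using inverse-distance weights.

    Labels: HH, LL (clusters), HL, LH (outliers). No significance test is applied.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]  # deviations from the global mean
    results = []
    for i in range(n):
        # S_i^2: spread of all other observations around the global mean.
        s2 = sum(dev[j] ** 2 for j in range(n) if j != i) / (n - 1)
        # Inverse-distance weighted sum of the neighbours' deviations.
        lag = 0.0
        for j in range(n):
            if j == i:
                continue
            distance = math.dist(coords[i], coords[j])
            lag += dev[j] / distance
        stat = dev[i] / s2 * lag
        if dev[i] >= 0:
            label = "HH" if lag >= 0 else "HL"
        else:
            label = "LL" if lag < 0 else "LH"
        results.append((stat, label))
    return results


# Toy example (made-up LQ values): two high areas close together, two low areas close together.
toy_coords = [(0, 0), (1, 0), (10, 0), (11, 0)]
toy_lq = [200, 180, 20, 40]
for stat, label in local_morans_i(toy_coords, toy_lq):
    print(f"{label}: I = {stat:.3f}")
```

A positive $I_i$ indicates that an area resembles its neighbours (HH or LL), and a negative value marks an outlier (HL or LH). In the study only areas passing the significance test are assigned to one of the four groups.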

Spatial modelling approaches
The multilevel model (MLM), eigenvector spatial-filtered single-level model (ESFSM) and spatially filtered MLM (SFMLM) were used to examine the relationship between the test variables and COVID-19 infection rates. The MLM can capture variations in the explanatory variables at the MSOA and LAD levels but not the spatial autocorrelation at each horizontal level. The ESFSM can produce a spatial signal to explain spatial autocorrelation in the residuals and to ensure the independence of the error term. 8,9 To capture both the vertical and horizontal spatial dependency effects of the COVID-19 infection level, the spatial filtering process is introduced to a multilevel setting. In the SFMLM specification, the spatial filtering component $\delta_0 + \delta_1 e_1 + \dots + \delta_n e_n$ accounts for the spatial effect defined by the Moran eigenvectors, and the component $u'_{0j} + r'_{ij}$ is independent white noise. The eigenvector filtering process is applied at the MSOA level.
All the model analyses were performed in the R environment, mainly using the 'lme4', 10 'spdep' 11 and 'spmoran' 12 packages. The Akaike Information Criterion (AIC) was applied to compare model fit, with a lower value representing a better fit. 11,13 Details of the three models can be found in the supplementary document.
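The AIC comparison can be sketched as follows, using the standard definition AIC = 2k - 2 ln(L), where k is the number of estimated parameters and L is the maximised likelihood. The study itself used R. The Python helper names, model labels, log-likelihoods and parameter counts below are placeholders for illustration only and are not results from the paper.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def best_by_aic(candidates):
    """candidates maps a model name to (log_likelihood, n_params).

    Returns the name with the lowest AIC and the full score table.
    """
    scores = {name: aic(ll, k) for name, (ll, k) in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores


# Placeholder values only; NOT estimates from the study.
candidates = {
    "model_a": (-1250.0, 12),
    "model_b": (-1235.0, 20),
    "model_c": (-1190.0, 24),
}
best, scores = best_by_aic(candidates)
for name, score in scores.items():
    print(f"{name}: AIC = {score:.1f}")
print("lowest AIC:", best)
```

The 2k term penalises the extra parameters that a richer model introduces, such as the eigenvector terms, so a more complex model is preferred only if its gain in likelihood outweighs that penalty.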

Spatial clustering of COVID-19 infection level
The spatial movement of the hotspots and outliers of COVID-19 infection LQ over the two time periods is shown in Fig. 1. While COVID-19 hotspots (HH clusters) in TP1 were spread across different parts of England, they tended to be found in northern England, with the M62 motorway corridor as the dividing line. The most affected areas were Merseyside, Greater Manchester, South Yorkshire, Tyne and Wear and Teesside as well as parts of Lancashire and Cumbria. The rest of northern England largely fell into the LH outlier group as the entire area was vulnerable to the spread of the virus. Many areas, including parts of the Midlands, fell into precarious categories and could not be classified. However, some hotspots were scattered across the Midlands, with larger clusters found in the Peterborough and Huntingdonshire area and in the north of Warwickshire. In southern England, many small outliers with high COVID-19 levels were surrounded by neighbouring areas with much lower infection levels; and London was the main outlying area with much higher infection levels than the adjacent areas. The only COVID-19 hotspot in the south during TP1 was in Kent.
The notable spatial change of infection in TP2 was characterised by a north-south divide along the Severn-Wash line and an urban-rural divide. A rural-urban shift of infection was witnessed between the two waves as most outliers in the shire areas turned into cold spots, which was true in southern England as well as in Cumbria and the East Riding of Yorkshire in the north. The COVID-19 hotspot footprint (HH) in TP2 became a mirror image of the functional urban areas, as defined by Eurostat's Urban Audit. These combined effects mean that the northern cities and their hinterlands were most affected by the pandemic throughout 2020. However, some hotspots in TP1 became cold spots (LL) in TP2 (e.g. the Peterborough and Huntingdonshire area), whereas most shire areas in the South West remained cold spots throughout. Some isolated HL outliers in TP1 turned into hotspots in TP2 by spreading the virus to their neighbours, with MSOAs in and surrounding Greater London as notable examples. Kent, nonetheless, remained a hotspot, especially with the Alpha variant found there.

Spatial modelling results of COVID-19 infection level
A comparison of the three models for TP1 suggests that the test variables in SFMLM1, together with the multilevel and spatial dependency effects, explain 51% of the variance in the COVID-19 infection rate in TP1, and that SFMLM1 has the highest adjusted $R^2$ and the lowest AIC value (see Table 1). SFMLM1 is therefore the best-fit model for TP1. Similarly, SFMLM2 has the best model fit among the three models examining the infection rate in TP2 (see Table 2). Figure 2 maps the residual spatial component predicted by the eigenvector spatial filtering process for TP1 and TP2. More details on model fit are given in the supplementary document.
When comparing SFMLM1 and SFMLM2 (see Tables 1 and 2), both the elderly population group (% aged ≥70) and '% usual residents living in a communal establishment' had significant and positive coefficients for both time periods. The younger age group (% aged 16-25), however, had an inverse relationship with infection rates in TP1 but was found to be strongly and positively related to the infection rate in TP2. Household size also manifested a major shift between the two waves, with '% households with four people' not significantly correlated with the infection rate in TP1 but showing a strong positive association in TP2.
Areas with a high proportion of Indian and Pakistani population groups were found to be badly affected by the pandemic in both time periods, with strong positive coefficients, whereas the situation for the Chinese population group was the opposite. The relationship for other ethnic groups was more mixed. The Black population group, without a significant coefficient in TP1, had a negative coefficient in TP2. The other Asian population group, found to be positively associated with infection rates in TP1, flipped to a negative coefficient in TP2, but the situation was the exact opposite for the Bangladeshi group.
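Two socio-economic variables performed the same over TP1 and TP2: '% population working from home' with a negative coefficient and '% population who work in the health sector' with a positive coefficient. None of the other occupational variables showed any significant relationship with infection level in TP1. The situation changed in TP2, as areas with a large proportion of high-end jobs, '% people in employment who are managers, directors and senior officials', and '% people in employment who are in skilled trades jobs' were negatively related to infection level. Area density, measured by '% built-up area', showed a positive but weak relationship with infection level in TP1, but the effect became insignificant in TP2. The other factor is the Index of Multiple Deprivation (IMD) rank, which had a significant but very weak negative effect in TP1, but the effect became insignificant in TP2.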
Finally, 'Affluent England' was found to have a small positive relationship with infection level in TP1, though it turned negative in TP2. The 'North West' and 'North East' were the two regions that consistently had strong positive coefficients in both models. 'Yorkshire and The Humber' showed a positive but small effect in TP1, whereas the 'South East' (mainly due to the spread of the Kent variant) had a positive coefficient in TP2. The 'South West' had a negative coefficient in TP1.

Main findings
The results from the spatial clustering analysis and the SFMLMs suggest that the original spread of the outbreak was more random. With the advance of the pandemic, the virus tended to spread outwards locally and a more spatially dependent pattern emerged in TP2. The changing spatial dependency effects captured by the SFMLMs over time demonstrate the role of geography in disease transmission, particularly during the pandemic. 14 This also points to the relevance of the local lockdown measures implemented in 2020. Thus, a dynamic and multiscalar spatial approach is much needed to develop evidence that tracks the changing transmission geography and informs policymaking at different levels.
The SFMLM results confirm the importance of the wider regional contextual effect: the North West and North East regions are found to be significantly correlated with the infection rate throughout the pandemic. Due to the Kent variant, the South East regional effect was also significant in the second wave, though less strong than that of the two northern regions. At the opposite end, the South West and Affluent England are found to be negatively related to the infection rate in TP1 and TP2, respectively. The disproportionate concentration of infections in the northern regions is largely related to their demographic and occupational compositions, but it is important to note that extra regional contextual effects are picked up by the SFMLMs. This suggests that there has been a failure to control the spread of the virus in these regions beyond what the test variables explain, which could be related to the interplay between central-local policy measures and resource inputs to these regions, or to behavioural factors that are not captured by the models. Only Affluent England has an inverse relationship with infection level in TP2, but that mainly refers to the affluent South East hinterland of London. This confirms that, except for the most affluent areas, other areas are more vulnerable to the infection risk, especially those with a large proportion of disadvantaged groups.
While some explanatory factors became less important between the two waves, others became more influential. There were strict national lockdown measures in TP1, but only national restrictive measures and local lockdowns were in place during TP2. This means that the changing explanatory power of different factors was an interactive outcome of changing government measures. One notable change was the local transmission effect in TP2: large household size came into play, as did the younger population groups who returned to schools and universities. Another change was associated with the IMD rank, which suggested that more deprived areas tended to have higher infection levels at the early stage of the pandemic, though this effect later washed out. The wider circulation across schools and colleges could have dampened the deprivation effect, and the government's local measures and other factors might have taken effect too. With the national lockdown in TP1, the risk of exposure across different occupation groups was insignificant; however, those in managerial and skilled occupations, who are more adaptable to online working, had a negative relationship with infection rates in TP2. This highlights the need for workplace mitigation measures, especially for those whose jobs require close personal contact.

What has already been known on this topic?
Commonly reported test variables in relation to COVID-19 cases are demographic variables, such as population, gender and age structure, with the inclusion of household size in some studies. 15,16 Different socio-economic groups have also been widely reported to face varied risks of exposure. 17-20 Population movement and population density have also been reported to influence the spatio-temporal distribution of COVID-19 cases. 15,21,22 Built environment attributes, such as commercial vitality, transport density and network accessibility, are found to be directly or indirectly related to COVID-19 spread in different contexts. 23,24 However, the findings are mixed across different contexts.

What this study adds
This study provides insights into the spatio-temporal landscape of COVID-19 infection in England. Besides various socio-economic and demographic factors, this study also examines the relationship between the often-overlooked built environment and locational variables and COVID-19 infection. This study adopts an SFMLM approach that captures both hierarchical and horizontal spatial interactive effects to examine how different variables at the MSOA and LAD levels are correlated with COVID-19 infection. This goes beyond the common use of either single-level spatial models or multilevel (non-spatial) models. Most health-related variables exhibit spatial dependency, whereas health outcome data are characterised by a hierarchical structure. The application of SFMLM in this study provides useful insights for future studies under different contexts.

Fig. 1 Spatial cluster analysis of LQ of COVID-19 infection rate in TP1 (a) and TP2 (b). Contains data from OS data and coronavirus data (https://coronavirus.data.gov.uk/) © Crown copyright and database right.

Fig. 2 Residual spatial component via eigenvector spatial filtering of SFMLM1 and SFMLM2. Contains data from OS data and coronavirus data (https://coronavirus.data.gov.uk/) © Crown copyright and database right.

Table 1
Estimation results on the COVID-19 infection rate in TP1

Table 2
Estimation results on the COVID-19 infection rate in TP2